OCR and OpenAI Processes
Architecture
Document Intelligence - Classification
Manual classification
File Architecture
Process flow
Decision Tree

Phase
Databricks – SQL Warehouse
External/Output
OCR - Code
Azure Resource
Web Page

Sample - uncluster
Sample - Label
Schema1, Schema2, Schema..., Schema28
Population - uncluster
Population - cluster
Extraction Model
Unknown, Template 2, Template 3, Template 4, Template 1

Population - Cluster
Schema1, Schema2, Schema..., Schema43
Sample - Label
Schema1, Schema2, Schema..., Schema43
Population - Cluster
Schema1, Schema2, Schema..., Schema43
Population - uncluster

template1, template2, template99
template1, template2, template99
trainingsamples
population
pdf
websiteinfo
json
jsondata
jsonmodified
json
pdf
jsondata
jsonresponses

Storage Account
OpenAI
Document Intelligence Script
Delta Migration
SQL Warehouse
QA/QC
Prompts
Same as PDFs
Each document has a unique document ID from Project ID extracted by OpenAI
Same names as PDFs
Unique document names

Population - cluster
Unknown, Template 2, Template 3, Template 4, Template 1
Extraction Model

websiteinfo
OpenAI Script
OpenAI – Code
jsondata
jsonmodified (Clone)
1:1
Reads/writes
overwritten
Listener 1:1
Field: verified (yes/no)
Field: verified (yes/no)
Business Intelligence Tools
Frequent access
Table 2, Table 1, Table 3
Raw PDFs
Document Intelligence
creates
access
access

Step 1: Identify minimum of 10 samples for each schema
Step 2: Train Classification Model
Step 2b: Upload
Step 3: Run Classification Model
Within each model, select required fields
Step 4: Run Extraction Model

Step 1: Manually Figure out Schemas/Formats
Step 2: Train Classification Model
Step 2b: Upload
Step 3: Run Classification Model

OCR
OpenAI
Document type
Azure Blob Storage Account Container
Databricks
SQL Warehouse
Raw PDFs
jsondata
Event Grid
Trigger on new PDFs uploaded
creates
jsonresponses
creates
WebApps
jsonmodified
references
updates
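The label "Step 1: Identify minimum of 10 samples for each schema" implies a simple pre-training check. The following is a minimal sketch of such a check, not the project's actual script: the function name, the `sample_labels` mapping (file name to schema label) and the `expected_schemas` argument are hypothetical and only illustrate the rule.

```python
from collections import Counter

# Threshold taken from the label "Step 1: Identify minimum of 10 samples for each schema".
MIN_SAMPLES_PER_SCHEMA = 10


def schemas_below_minimum(sample_labels, expected_schemas=(), minimum=MIN_SAMPLES_PER_SCHEMA):
    """Return {schema: sample_count} for every schema with fewer than `minimum` samples.

    sample_labels    -- hypothetical mapping of sample file name -> schema label
    expected_schemas -- optional schema names that must appear even with zero samples
    """
    # Start every expected schema at zero so schemas with no samples are reported too.
    counts = Counter({schema: 0 for schema in expected_schemas})
    counts.update(sample_labels.values())
    return {schema: n for schema, n in sorted(counts.items()) if n < minimum}


if __name__ == "__main__":
    labels = {f"a{i}.pdf": "Schema1" for i in range(10)}
    labels.update({f"b{i}.pdf": "Schema2" for i in range(3)})
    print(schemas_below_minimum(labels, expected_schemas=["Schema3"]))
```

Any schema returned by the function would need more labelled samples before the classification model is trained.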
Manual clone
OpenAI (SSC)
... ... ... ... ...

Within each model, select required fields
Step 4: Run Extraction Model

Want to perform analysis on data.
Is the document digitized?
What is the document format?
Not a supported format. Research phase to convert videos into frames
Video
Format is supported; still research phase
Image
Is OpenAI integration required?
Text
Access Data in SQL Warehouse
No
Is source data Protected-B or above?
Yes
OpenAI not cleared by IT-SEC to use sensitive data
Yes
Process large amount of documents with OpenAI?
No
OpenAI web-chatbot: uses web interface to ask individual questions for a given document.
No
Are prompts pre-determined?
OpenAI-Script can group large amounts of documents for a given set of prompts and process all documents at once.
OpenAI WebApp uses web interface to upload a document with a set of prompts and get OpenAI responses

FoSx-SP-Waayback-LethalIndigowingedparrot
PSSI-OpenAI-RG
Storage Account (pssidatalake)
Web App (pssi-openAI-prompts)
Document Intelligence (pssi-prebult-models)
Data Factory (pssi-pipelines)
Databricks (pssi-openai-databricks)
Web App (pssi-prd-pstb-rcoe-gc)
Web App (pssi-prd-emb-ispe-planningliterature)
SSC Directory
DFO Directory
SQL Warehouse (emb-ipse)
SQL Warehouse (pstb-rcoe)
Azure OpenAI (pssi-openai)

SC2G - PROD ProB
EDH-PSSI-PROD-RG
Storage Account (stpssiprd)
Document Intelligence (pssi-doc-ai-prd)
Data Factory (adfpsssiprdinnovation)
Web App (pssi-openAI-chatbot)
EDH-PROD-RG
Blob Container (fm-rec-fishslips)
Blob Container (science-stockassesment-sil)
Blob Container (rm-dml-licenses-logbooks)
SQL Warehouse (science-stockassesment)
SQL Warehouse (emb-ffhpp)
SQL Warehouse (rm-dml)
SQL Warehouse (fm-rec)

Note: The newest Document Intelligence API, which outputs a confidence score for extracted table data, is currently not available for region "Central Canada".
Note: OpenAI cannot be deployed in the DFO directory, hence all of the OpenAI-related instances are in SSC.
Missing: Access to Databricks in EDH-PROD-RG with permissions to interact with Document Intelligence, Data Factory, and Storage Account from EDH-PSSI-PROD-RG
Missing: Instances of SQL Warehouses for each project having data processed in EDH-PSSI-PROD

SQL Warehouse (qcfm-rec)
Databricks
Web App (pssi-prd-ocr-qcfm-rec)
Web App (pssi-prd-ocr-emb-ffhpp)
Web App (pssi-prd-ocr-fm-rec)
Web App (pssi-prd-ocr-science-stockassesment)
Blob Container (emb-ffhpp-complicance-inspectionreports)
Blob Container (qcfm-rec-seaobserver-logbooks-dockside-purchaseslips)
Blob Container (emb-ipse-planningliterature)
Blob Container (pstb-rcoe-gc)
creates
access
jsondata
creates
references

More than 1 template?
Document Intelligence Classification & Extraction Process (Custom)
Yes
QA/QC Website interface
Digitized
Document Intelligence Extraction (Pre-built)
No
No
Yes
yes
no
yes

Legend:
- In-house Application
- SAS
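The OpenAI questions in the decision tree ("Is OpenAI integration required?", "Is source data Protected-B or above?", "Process large amount of documents with OpenAI?", "Are prompts pre-determined?") can be read as a small selection function. The sketch below is one plausible reading only: the flattened diagram does not preserve which Yes/No edge leads to which box, so the branch assignments are assumptions inferred from the wording of the outcomes, and the function and parameter names are hypothetical.

```python
def choose_openai_tool(integration_required,
                       protected_b_or_above=False,
                       large_volume=False,
                       prompts_predetermined=False):
    """Return the outcome label for one assumed path through the OpenAI branch."""
    # Assumed edge: no OpenAI integration needed -> work with the data directly.
    if not integration_required:
        return "Access Data in SQL Warehouse"
    # Assumed edge: sensitive data stops the OpenAI path.
    if protected_b_or_above:
        return "OpenAI not cleared by IT-SEC to use sensitive data"
    # Assumed edge: a small number of documents -> ask individual questions in the chatbot.
    if not large_volume:
        return "OpenAI web-chatbot"
    # Assumed edges: a fixed prompt set goes to the batch script, otherwise the WebApp.
    if prompts_predetermined:
        return "OpenAI-Script"
    return "OpenAI WebApp"


if __name__ == "__main__":
    print(choose_openai_tool(True, large_volume=True, prompts_predetermined=True))
```

Keeping the outcome strings identical to the diagram labels makes it easy to compare the function against the original drawing and correct any branch that was read wrongly.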